CHARACTERIZATION OF THE YEAST TRANSCRIPTOME

This application is a continuation-in-part of co-pending application Serial No. 
09/012,031 filed January 22, 1998, the disclosure of which is incorporated by reference 
herein. This invention was made with government support under CA57345 awarded 
by the National Institutes of Health. The government has certain rights in the 
invention. 

TECHNICAL FIELD OF THE INVENTION

This invention is related to the characterization of the expressed genes of the 
yeast genome. More particularly, it is related to the identification and use of previously
unrecognized genes. 

BACKGROUND OF THE INVENTION

It is by now axiomatic that the phenotype of an organism is largely determined 
by the genes expressed within it. These expressed genes can be represented by a 
"transcriptome," conveying the identity of each expressed gene and its level of 
expression for a defined population of cells. Unlike the genome, which is essentially 
a static entity, the transcriptome can be modulated by both external and internal
factors. The transcriptome thereby serves as a dynamic link between an organism's 
genome and its physical characteristics. 

The transcriptome as defined above has not been characterized in any eukaryotic 
or prokaryotic organism, largely because of technological limitations. However, some 
general features of gene expression patterns were elucidated two decades ago through
RNA-DNA hybridization measurements (Bishop et al., 1974; Hereford and Rosbash,
1977). In many organisms, it was thus found that at least three classes of transcripts
could be identified, with either high, medium, or low levels of expression, and the
number of transcripts per cell was estimated (Lewin, 1980). These data of course
provided little information about the specific genes that were members of each class. 
Data on the expression levels of individual genes have accumulated as new genes were 
discovered. However, in only a few instances have the absolute levels of expression 
of particular genes been measured and compared to other genes in the same cell type. 

Description of any cell's transcriptome would therefore provide new 
information useful for understanding numerous aspects of cell biology and
biochemistry. 

SUMMARY OF THE INVENTION 

It is an object of the present invention to provide isolated DNA molecules and 
methods of using such molecules to affect the cell cycle and identify candidate drugs. 
These and other objects of the invention are achieved by providing the art with one 
or more of the embodiments described below. 

According to one embodiment of the invention an isolated DNA molecule is 
provided. It comprises a coding sequence of a yeast gene selected from the group 
consisting of NORF genes comprising a SAGE tag as shown in SEQ ID NOS:67-811.

According to another embodiment of the invention a method of using NORF 
genes is provided. The method is for affecting the cell cycle of a cell. The method 
comprises the step of administering to a cell an isolated DNA molecule comprising 
a coding sequence of a NORF gene whose expression varies by at least 10% between 
any two phases of the cell cycle selected from the group consisting of log phase, S 
phase, and G2/M. 

In yet another embodiment of the invention a method for screening candidate 
antifungal drugs is provided. The method comprises the steps of contacting a test 
substance with a yeast cell and monitoring expression of a NORF gene whose 
expression varies by at least 10% between any two phases of the cell cycle selected 
from the group consisting of log phase, S phase, and G2/M, wherein a test substance
which modifies the expression of the yeast gene is a candidate antifungal drug.

In still another embodiment of the invention a method for identifying human 
genes which are involved in cell cycle progression is provided. The method comprises 
the step of contacting human DNA with a probe which comprises at least 14 
contiguous nucleotides of a NORF gene whose expression varies by at least 10% 
between any two phases of the cell cycle selected from the group consisting of log
phase, S phase, and G2/M. A human DNA sequence which hybridizes to the probe 
is identified as a sequence of a candidate human gene which is involved in cell cycle 
progression. 

The present invention provides probes which comprise at least 14 contiguous 
nucleotides of a NORF gene comprising a SAGE tag as shown in SEQ ID NOS:67-811.

The invention also provides an array of probes on a solid support. At least one 
probe in the array comprises at least 14 contiguous nucleotides of a NORF gene 
comprising a SAGE tag as shown in SEQ ID NOS:67-811.

Still another embodiment of the invention is a method of identifying a candidate
drug as a member of a class of drugs having a characteristic effect on gene expression 
in a yeast cell. A yeast cell is contacted with a candidate drug. Expression of at least 
one NORF gene whose expression is affected by the class of drugs is monitored in the 
yeast cell. Detection of a difference in expression of the at least one NORF gene 
relative to expression in the absence of the candidate drug identifies the candidate 
drug as a member of the class of drugs. 

These and other embodiments of the invention, which will be apparent to those
of skill in the art upon reading the detailed disclosure provided below, make available 
to the art hitherto unrecognized genes, and information about the expression of genes 
globally at the organismal level. 

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1. Schematic of SAGE Method and Genome Analysis. In applying
SAGE to the analysis of yeast gene expression patterns, the 3' most NlaIII site was
used to define a unique position in each transcript and to provide a site for ligation of
a linker with a BsmFI site. The type IIS enzyme BsmFI, which cleaves a defined
distance from its non-palindromic recognition site, was then used to generate a 15 bp
SAGE tag (designated by the black arrows), which includes the NlaIII site.
Automated sequencing of concatenated SAGE tags allowed the routine identification
of about a thousand tags per 36-lane sequencing gel. Once sequenced, the abundance
of each SAGE tag was calculated, and each tag was used to search the entire yeast
genome to identify its corresponding gene. The lower panel shows a small region of
Chromosome 15. Gray arrows indicate all potential SAGE tags (NlaIII sites) and
black arrows indicate 3' most SAGE tags. The total number of tags observed for each
potential tag is indicated above (+ strand) or below (- strand) the tag. As expected, the
observed SAGE tags were associated with the 3' end of expressed genes.

Figure 2. Sampling of Yeast Gene Expression. Analysis of increasing amounts
of ascertained tags reveals a plateau in the number of unique expressed genes. 
Triangles represent genes with known functions, squares represent genes predicted on 
the basis of sequence information, and circles represent total genes. 

Figure 3. Virtual Rot. (A) Abundance Classes in the Yeast Transcriptome.
The transcript abundance is plotted in reverse order on the abscissa, whereas the
fraction of total transcripts with at least that abundance is plotted on the ordinate. The
dotted lines identify the three components of the curve, 1, 2, and 3. This is analogous
to a Rot curve derived from reassociation kinetics where the product of initial RNA
concentration and time is plotted on the abscissa, and the percent of labeled cDNA
that hybridizes to excess mRNA is plotted on the ordinate. (B) Comparison of
Virtual Rot and Rot Components. Transitions and data from virtual Rot components
were calculated from the data in Figure 3A, while data for Rot components were
obtained from Hereford and Rosbash, 1977. 

Figure 4. Chromosomal Expression Map for S. cerevisiae. Individual yeast
genes were positioned on each chromosome according to their open reading frame 
(ORF) start coordinates. Abundance levels of tags corresponding to each gene are 
displayed on the vertical axis, with transcription from the + strand indicated above the 
abscissa and that from the - strand indicated below. Yellow bands at ends of the 
expanded chromosome represent telomeric regions that are undertranscribed (see text
for details).

Figure 5. Northern Blot Analysis of Representative Genes. TDH2/3, TEF1/2,
and NORF1 are expressed relatively equally in all three states (lane 1, G2/M arrested;
lane 2, S phase arrested; lane 3, log phase), while RNR4, RNR2, and NORF5 are
highly expressed in S-phase arrested cells. The expression level observed by SAGE
(number of tags) is noted below each lane and was highly correlated with quantitation
of the Northern blot by PhosphorImager analysis (r² = 0.97).

TABLE LEGENDS 

Table 1. Highly Expressed Genes. Tag represents the 10 bp SAGE tag adjacent
to the NlaIII site; Gene represents the gene or genes corresponding to a particular tag
(multiple genes that match unique tags are from related families, with an average
identity of 93%); Locus and Description denote the locus name and functional
description of each ORF, respectively; Copies/cell represents the abundance of each
transcript in the SAGE library, assuming 15,000 total transcripts per cell and 60,633
ascertained transcripts.
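
By way of illustration only, the Copies/cell conversion described in the legend of Table 1 can be sketched as a simple proportional scaling of an observed tag count. The function name and the default arguments below are illustrative assumptions; only the two constants (15,000 transcripts per cell and 60,633 ascertained transcripts) are taken from the legend.

```python
# Constants taken from the Table 1 legend.
TOTAL_TRANSCRIPTS_PER_CELL = 15_000
ASCERTAINED_TRANSCRIPTS = 60_633


def copies_per_cell(tag_count,
                    total_tags=ASCERTAINED_TRANSCRIPTS,
                    transcripts_per_cell=TOTAL_TRANSCRIPTS_PER_CELL):
    """Scale an observed SAGE tag count to an estimated copy number per cell.

    Hypothetical helper: assumes tag counts are proportional to transcript
    abundance, which is the assumption stated in the legend.
    """
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    return tag_count * transcripts_per_cell / total_tags


if __name__ == "__main__":
    # A tag observed once corresponds to roughly a quarter of a copy per cell.
    print(round(copies_per_cell(1), 2))
```

Under this scaling, a tag observed a single time corresponds to about 0.25 copies per cell, which is in line with the lowest expression level of approximately 0.3 copies per cell reported in the Example.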

Table 2. Expression of Putative Coding Sequences. Table column headings are
the same as for Table 1.

Table 3. Expression of the most abundant NORF genes. SAGE Tag, Locus,
and Copies/cell are the same as for Table 1; Chr and Tag Pos denote the chromosome
and position of each tag; ORF Size denotes the size of the ORF corresponding to the
indicated tag. In each case, the tag was located within or less than 250 bp 3' of the
NORF.

Table 4. Expression of NORF genes. SAGE tag and Copies/cell are the same 
as for Table 1. Chr and Tag Pos denote the chromosome and position of each tag. 

Table 5. Gene expression changes in different cell cycle phases. L denotes log
phase; S denotes synthesis phase; G2/M denotes the mitotic phase. Tag Sequence
represents the 10 bp SAGE tag adjacent to the NlaIII site; "ratio L to S" denotes the
ratio of expression in log phase to expression in synthesis phase; "ratio S to G2/M"
denotes the ratio of expression in synthesis phase to expression in G2/M phase; "ratio
G2/M to L" denotes the ratio of expression in G2/M to log phase; #DIV/0! indicates
an increase in expression from 0; a value of 0 indicates a decrease in expression to 0;
a value of 1 indicates no change; a value less than 1 indicates a decrease in expression;
and a value greater than 1 indicates an increase in expression.
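
As a minimal sketch of how the ratio columns of Table 5 could be computed from tag counts, the following assumes that each ratio is the first-named count divided by the second-named count, as the legend states, and that a zero denominator is reported with the spreadsheet-style marker #DIV/0!. The function names are hypothetical, and the handling of a zero-over-zero case (also reported as #DIV/0!) is an assumption not addressed in the legend.

```python
def expression_ratio(numerator_tags, denominator_tags):
    """Return numerator/denominator, or '#DIV/0!' when the denominator is 0."""
    if denominator_tags == 0:
        return "#DIV/0!"
    return numerator_tags / denominator_tags


def table5_ratios(log_tags, s_tags, g2m_tags):
    """Compute the three ratio columns for one tag (hypothetical helper)."""
    return {
        "ratio L to S": expression_ratio(log_tags, s_tags),
        "ratio S to G2/M": expression_ratio(s_tags, g2m_tags),
        "ratio G2/M to L": expression_ratio(g2m_tags, log_tags),
    }


if __name__ == "__main__":
    # Illustrative counts only; not data from Table 5.
    print(table5_ratios(log_tags=4, s_tags=2, g2m_tags=0))
```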

Table 6. Intergenic open reading frames that contain or are adjacent to observed 
SAGE tags. Copies/cell represents abundance of each mRNA transcript as in Table
1. Positive expression level indicates the tag is on the + strand of the chromosome;
Negative expression level indicates the tag is on the - strand. 

DETAILED DESCRIPTION

It is a discovery of the present invention that certain hitherto unknown genes 
(the NORFs) exist and are expressed in yeast. These genes, as well as other 
previously identified and previously postulated genes, can be used to study, monitor, 
and affect phases of the cell cycle. The present invention identifies which genes are
differentially expressed during the cell cycle. Differentially expressed genes can be 
used as markers of phases of the cell cycle. They can also be used to affect a change 
in the phase of the cell cycle. In addition, they can be used to screen for drugs which 
affect the cell cycle, by affecting expression of the genes. Human homologs of these
eukaryotic genes are also presumed to exist, and can be identified using the yeast
genes as probes or primers.

New genes termed NORFs (not previously assigned open reading frames) have 
been found. They are uniquely identified by their SAGE tags. In addition their entire 
nucleotide sequences are known and publicly available. In general, these were not 
previously identified as genes due to their small size. However, they have now been 
found to be expressed. 

Differentially expressed yeast genes are those whose expression varies by a 
statistically significant difference (to greater than 95% confidence level) within 
different growth phases, particularly log phase, S phase, and G2/M. Preferably the 
difference is at least 10%, 25%, 50%, or 100%. In some cases, differentially 
expressed genes are not expressed at detectable levels in one or more cell cycle phases 
as determined by SAGE analysis. Genes which have been found to have differential 
expression characteristics include: NORF 1, 2, 4, 5, 6, 17, 25, 27, TEF1/TEF2,
ENO2, ADH1, ADH2, PGK1, CUP1A/CUP1B, PYK1, YKL056C, YMR116C,
YEL033W, YOR182C, YCR013C, ribonucleotide reductase 2 and 4, and YJR085C.
Differential expression can be detected by any means known in the art, such as 
hybridization to specific probes or immunological assays. 

Isolated DNA molecules according to the invention contain less than a whole 
chromosome and can be genomic or cDNA, i.e., lacking introns. Isolated DNA 
molecules can comprise a yeast gene or a coding sequence of a yeast gene involved 
in cell cycle progression, such as NORF genes which comprise SAGE tags as shown 
in SEQ ID NOS:67-811. Isolated DNA molecules which comprise yeast genes or
coding sequences of yeast genes comprising SAGE tags as shown in SEQ ID
NOS:37-12,203 are also isolated DNA molecules of the invention. Isolated DNA molecules
can also consist of a yeast gene or a coding sequence of a yeast gene which comprises
a SAGE tag as shown in SEQ ID NOS:37-12,203 or 67-811.

Any technique for obtaining a DNA of known sequence may be used to obtain 
isolated DNA molecules of the invention. Preferably they are isolated free of other 
cellular components such as membrane components, proteins, and lipids. They can 
be made by a cell and isolated, or synthesized using PCR or an automatic synthesizer.
Methods for purifying and isolating DNA are routine and are known in the art. 

To administer yeast genes to cells, any DNA delivery technique known in the
art may be used, without limitation. These include liposomes, transfection, mating,
transduction, transformation, viral infection, and electroporation. Vectors for particular
purposes and characteristics can be selected by the skilled artisan for their known
properties. Cells which can be used as gene recipients are yeast and other fungi,
mammalian cells, including human cells, and bacterial cells.

Antifungal drugs can be identified using yeast cells as described herein. 
Expression of a differentially expressed NORF gene can be monitored by any means 
known in the art. When a test substance modifies the expression of such a 
differentially expressed gene, for example by increasing or decreasing its expression, 
it is a candidate drug for affecting the growth properties of fungi and may be useful
as an antifungal agent. Expression of more than one NORF gene can be monitored.
For example, expression of 2, 3, 4, 5, 10, 15, 20, 30, 40, 50, 60, 75, 100, 150, 250,
300, 350, 400, 450, or 500 or more NORF genes can be monitored in single or
multiple assays.

Because differentially expressed genes are likely to be involved in cell cycle 
progression, it is likely that these genes are conserved among species. The 
differentially expressed NORF genes identified by the present invention can be used 
to identify homologs in humans and other mammals by contacting DNA from these
mammals with a probe which comprises at least 10 contiguous nucleotides of a
differentially expressed NORF gene. The DNA can be genomic or cDNA, as is
known in the art. Means for identifying homologous genes among different species
are well known in the art. Briefly, stringency of hybridization can be reduced so that
imperfectly matching sequences hybridize. This can be in the context of, inter alia,
Southern blots, Northern blots, colony hybridization, or PCR. Any hybridization
technique which is known in the art can be used. A DNA sequence which hybridizes
to the probe is identified as a sequence of a candidate gene which is involved in cell
cycle progression.

Probes according to the present invention are isolated DNA molecules which 
have at least 10, and preferably at least 12, 14, 16, 18, 20, or 25 contiguous 
nucleotides of a particular NORF gene or other differentially expressed gene. The 
probes may or may not be labeled. They may be used, for example, as primers for 
PCR assays, or for detection of gene expression for Southern or Northern blots or in
situ hybridization. Preferably the probes are immobilized on a solid support. The
solid support can be any surface to which a probe can be attached. Suitable solid
supports include, but are not limited to, glass or plastic slides, tissue culture plates,
microtiter wells, tubes, or particles such as beads, including but not limited to latex,
polystyrene, or glass beads. Any method known in the art can be used to attach
a probe to the solid support, including use of covalent and non-covalent linkages,
passive absorption, or pairs of binding moieties attached respectively to the probe and 
the solid support. 

More preferably, probes are present on an array so that multiple probes can 
simultaneously hybridize to a single biological sample. The probes can be spotted 
onto the array or synthesized in situ on the array. See Lockhart et al., Nature
Biotechnology, Vol. 14, December 1996, "Expression monitoring by hybridization
to high-density oligonucleotide arrays." A single array contains at least one NORF
probe, but can contain more than 100, 500 or even 1,000 different probes in discrete
locations. If desired, one or more NORF probe(s) present on the array can be
nucleotide sequences from a NORF gene which is differentially expressed during the
cell cycle.

Genes identified by the present invention which are differentially expressed
during the cell cycle can also be used to obtain gene expression profiles characteristic
of the response of a yeast cell to a particular drug or class of drugs.
Classes of drugs of particular interest for which gene expression profiles can be
generated include those drugs which affect cell cycle or other cell processes, such as
chemotherapeutic agents. If desired, gene expression profiles characteristic of more
than one drug of a particular class can be generated and used to make a composite
gene expression profile. For example, microtubule poison drugs such as vinblastine,
taxol, vincristine, and taxotere can be used to generate gene expression profiles
characteristic of microtubule poisons.

To generate a gene expression profile characteristic of a particular drug or class
of drugs, a yeast cell is contacted with a particular drug or a member of a particular
class of drugs. Expression of at least one yeast gene is monitored, either before and
after contacting or in the contacted cell and in another yeast cell which has not been
contacted with the drug. Genes which are monitored can be any yeast gene, including
NORFs. Preferably, these genes are differentially expressed during the cell cycle. For
example, yeast genes can be selected from genes comprising the SAGE tags shown
in Tables 3, 4, 5, and 6 (SEQ ID NOS:67-12,203). If desired, genes such as NORF
1, 2, 4, 5, 6, 17, 25, or 27, TEF1/TEF2, ENO2, ADH1, ADH2, PGK1,
CUP1A/CUP1B, PYK1, YKL056C, YMR116C, YEL033W, YOR182C, YCR013C,
ribonucleotide reductase 2 and 4, and YJR085C, can be used for monitoring
alterations in gene expression.

The expression of any number of these genes, such as 1, 2, 3, 4, 5, 10, 15, 20,
25, 30, 40, 50, 60, 75, 100, 150, 250, 500, 1000, 2000, 3000, 4000, 5000, or 5,500
genes, can be measured. It is particularly convenient to monitor expression of the
differentially expressed genes using nucleic acids which are immobilized on a solid
support or in an array, such as the gene arrays described above.

Many genes, particularly cell cycle genes, are likely to be conserved between 
yeast and mammals, including humans. Thus, gene expression profiles characteristic 
of a drug or class of drugs can be used to predict the effects of candidate drugs on 
human cells, by identifying the candidate drug as a member of a class of drugs whose 
characteristic gene expression profile is known. The candidate drugs can be 
pharmacologic agents already known in the art or can be compounds previously 
unknown to have any pharmacological activity. The candidate drugs can be naturally 
occurring or designed in the laboratory. They can be isolated from microorganisms,
animals, or plants, and can be produced recombinantly or synthesized by chemical 
methods known in the art. 

The effect of a candidate drug on expression of at least one gene whose 
expression is affected by the class of drugs is monitored. A gene expression profile 
obtained using the candidate drug which is similar to a gene expression profile for a 
particular drug or class of drugs identifies the candidate drug as a member of that class 
of drugs. 
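
The comparison of a candidate drug's gene expression profile against known class profiles can be sketched as follows. This is an illustration under stated assumptions only: the disclosure does not specify a similarity measure, so the Pearson correlation, the 0.8 threshold, the function names, and the example profile values are all hypothetical choices made for the sketch.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern to compare
    return cov / sqrt(var_x * var_y)


def best_matching_class(candidate, class_profiles, threshold=0.8):
    """Return the class whose profile best matches, or None if none qualifies.

    `threshold` is an assumed cutoff, not a value from the disclosure.
    """
    best_name, best_r = None, threshold
    for name, profile in class_profiles.items():
        r = pearson(candidate, profile)
        if r >= best_r:
            best_name, best_r = name, r
    return best_name


if __name__ == "__main__":
    # Hypothetical fold-change values for four monitored genes.
    classes = {
        "microtubule poison": [2.0, 0.5, 1.0, 3.0],
        "other class": [0.5, 2.0, 1.0, 0.3],
    }
    print(best_matching_class([2.1, 0.6, 1.1, 2.8], classes))
```

Any other measure of profile similarity could be substituted for the correlation used here; the sketch is intended only to make the classification step concrete.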

The effect of modifying particular substituents of a known drug or of a candidate 
drug can be similarly tested. Such methods are useful for determining whether 
alterations intended, for example, to increase solubility or absorption of a particular 
drug will have an unintended and possibly deleterious effect on genes which are 
differentially expressed during the cell cycle. 

The above disclosure generally describes the present invention. A more 
complete understanding can be obtained by reference to the following specific 
examples which are provided herein for purposes of illustration only, and are not 
intended to limit the scope of the invention. 

EXAMPLE

Summary 

We have analyzed the set of genes expressed from the yeast genome, herein 
called the transcriptome, using serial analysis of gene expression (SAGE). Analysis
of 60,633 transcripts revealed 4,665 genes, with expression levels ranging from 0.3
to over 200 transcripts per cell. Of these genes, 1,981 had known functions, while 
2,684 were previously uncharacterized. Integration of positional information with 
gene expression data allowed the generation of chromosomal expression maps, 
identifying physical regions of transcriptional activity, and identified genes that had 
not been predicted by sequence information alone. These studies provide insight into 
global patterns of gene expression in yeast and demonstrate the feasibility of
genome-wide expression studies in eukaryotes.

Results 

Characteristics and Rationale of SAGE Approach 

Several methods have recently been described for the high throughput 
evaluation of gene expression (Nguyen et al., 1995; Schena et al., 1995; Velculescu
et al., 1995). We used SAGE (Serial Analysis of Gene Expression) because it can
provide quantitative gene expression data without the prerequisite of a hybridization
probe for each transcript. The SAGE technology is based on two basic principles
(Figure 1). First, a short sequence tag (9-11 bp) contains sufficient information to
uniquely identify a transcript, provided that it is derived from a defined location within 
that transcript. Second, many transcript tags can be concatenated into a single 
molecule and then sequenced, revealing the identity of multiple tags simultaneously. 
The expression pattern of any population of transcripts can be quantitatively evaluated 
by determining the abundance of individual tags and identifying the gene 
corresponding to each tag. 
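
The final step described above, determining the abundance of individual tags and identifying the gene corresponding to each tag, can be sketched as follows. This is a minimal illustration only: it assumes the tags have already been extracted from the sequenced concatemers, and the function names, the example tag sequences, and the tag-to-gene lookup table are hypothetical rather than part of the analysis described herein.

```python
from collections import Counter


def tag_abundance(tags):
    """Count how many times each tag sequence was observed."""
    return Counter(tag.upper() for tag in tags)


def annotate(counts, tag_to_gene):
    """Pair each tag with its gene (or 'no match') and its observed count.

    `tag_to_gene` stands in for a search of the genome sequence; here it is
    simply a dictionary supplied by the caller.
    """
    return [
        (tag, tag_to_gene.get(tag, "no match"), n)
        for tag, n in counts.most_common()
    ]


if __name__ == "__main__":
    # Made-up 10 bp tags and a made-up lookup table, for illustration only.
    observed = ["ACGTACGTAC", "acgtacgtac", "TTTTGGGGCC"]
    lookup = {"ACGTACGTAC": "GENE_A"}
    for row in annotate(tag_abundance(observed), lookup):
        print(row)
```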

Genome-wide expression

In order to maximize representation of genes involved in normal growth and cell-cycle 
progression, SAGE libraries were generated from yeast cells in three states: log phase, 
S phase arrested and G2/M phase arrested. In total, SAGE tags corresponding to
60,633 total transcripts were identified (including 20,184 from log phase, 20,034 from
S phase arrested, and 20,415 from G2/M phase arrested cells). Of these tags, 56,291
(93%) precisely matched the yeast genome, 88 tags matched the mitochondrial
genome, and 91 tags matched the 2 micron plasmid.

The number of SAGE tags required to define a yeast transcriptome depends on 
the confidence level desired for detecting low abundance mRNA molecules. 
Assuming the previously derived estimate of 15,000 mRNA molecules per cell 
(Hereford and Rosbash, 1977), 20,000 tags would represent a 1.3 fold coverage even 
for mRNA molecules present at a single copy per cell, and would provide a 72% 
probability of detecting such transcripts (as determined by Monte Carlo simulations).
Analysis of 20,184 tags from log phase cells identified 3,298 unique genes. As an 
independent confirmation of mRNA copy number per cell, we compared the 
expression level of SUP44/RPS4, one of the few genes whose absolute mRNA levels 
have been reliably determined by quantitative hybridization experiments (Iyer and 
Struhl, 1996), with expression levels determined by SAGE. SUP44/RPS4 was 
measured by hybridization at 75 +/- 10 copies/cell (Iyer and Struhl, 1996), in good 
accord with the SAGE data of 63 copies/cell, suggesting that the estimate of 15,000 
mRNA molecules per cell was reasonably accurate. Analysis of SAGE tags from S 
phase arrested and G2/M phase arrested cells revealed similar expression levels for 
this gene (range 52 to 55 copies/cell), as well as for the vast majority of expressed 
genes. As less than 1% of the genes were expressed at dramatically different levels 
among these three states (see below), SAGE tags obtained from all libraries were 
combined and used to analyze global patterns of gene expression. 

Analysis of ascertained tags at increasing increments revealed that the number 
of unique transcripts plateaued at approximately 60,000 tags (Figure 2). This suggested that
generation of further SAGE tags would yield few additional genes, consistent with the 
fact that sixty thousand transcripts represented a four-fold redundancy for genes 
expressed as low as 1 transcript per cell. Likewise, Monte Carlo simulations indicated 
that analysis of 60,000 tags would identify at least one tag for a given transcript 97% 
of the time if its expression level was one copy per cell. 
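
The sampling argument above can be sketched numerically. The following is an illustration under simplifying assumptions, not a reproduction of the Monte Carlo simulations referred to in the text, whose details are not given here: it assumes each sequenced tag is an independent draw from a pool of 15,000 mRNA molecules per cell, and the function names, the number of trials, and the random seed are hypothetical. Under these assumptions the closed-form probability of seeing at least one tag for a single-copy transcript is roughly 0.74 for 20,000 tags and roughly 0.98 for 60,000 tags, close to the 72% and 97% figures reported above.

```python
import random


def detection_probability(n_tags, copies_per_cell=1.0,
                          transcripts_per_cell=15_000):
    """Closed-form chance of observing at least one tag for a transcript."""
    p = copies_per_cell / transcripts_per_cell
    return 1.0 - (1.0 - p) ** n_tags


def simulate_detection(n_tags, copies_per_cell=1.0,
                       transcripts_per_cell=15_000, trials=200, seed=0):
    """Simple Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    p = copies_per_cell / transcripts_per_cell
    hits = 0
    for _ in range(trials):
        # One trial = one library of n_tags independent tag draws.
        if any(rng.random() < p for _ in range(n_tags)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (20_000, 60_000):
        print(n, round(detection_probability(n), 3))
```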

The 56,291 tags that precisely matched the yeast genome represented 4,665 
different genes. This number is in agreement with the estimate of 3,000 to 4,000 
expressed genes obtained by RNA-DNA reassociation kinetics (Hereford and
Rosbash, 1977). These expressed genes included 85% of the genes with characterized
functions (1,981 of 2,340), and 76% of the total genes predicted from analysis of the 
yeast genome (4,665 of 6,121). These numbers are consistent with a relatively 
complete sampling of the yeast transcriptome given the limited number of 
physiological states examined and the large number of genes predicted solely on the 
basis of genomic sequence analysis. 

The transcript expression per gene was observed to vary from 0.3 to over 200 
copies per cell. Analysis of the distribution of gene expression levels revealed several 
abundance classes that were similar to those observed in previous studies using 
reassociation kinetics. A "virtual Rot" of the genes observed by SAGE (Figure 3A) 
identified three main components of the transcriptome with abundances ranging over 
three orders of magnitude. A Rot curve derived from RNA-cDNA reassociation 
kinetics also contained three main components distributed over a similar range of 
abundances (Hereford and Rosbash, 1977). Although the kinetics of reassociation of 
a particular class of RNA and cDNA may be affected by numerous experimental 
variables, there were striking similarities between Rot and virtual Rot analyses (Figure 
3B). Because Rot analysis may not detect all transcripts of low abundance (Lewin, 
1980), it is not surprising that SAGE revealed both a larger total number of expressed 
genes and a higher fraction of the transcriptome belonging to the low abundance 
transcript class. 
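
The construction of a "virtual Rot" curve as described in the legend of Figure 3 can be sketched as a cumulative sum over genes ranked by abundance. This sketch assumes one reading of that legend, namely that the ordinate is the fraction of all transcript copies contributed by genes with at least the given abundance; the function name and the example values are hypothetical.

```python
def virtual_rot(copies_per_gene):
    """Return (abundance, cumulative fraction) points, highest abundance first.

    For each distinct abundance level, the fraction is the share of total
    transcript copies contributed by genes at or above that level.
    """
    total = sum(copies_per_gene)
    points = []
    running = 0.0
    for abundance in sorted(set(copies_per_gene), reverse=True):
        running += abundance * copies_per_gene.count(abundance)
        points.append((abundance, running / total))
    return points


if __name__ == "__main__":
    # Made-up copy numbers for four genes, for illustration only.
    for abundance, fraction in virtual_rot([100, 10, 10, 1]):
        print(abundance, round(fraction, 3))
```

Plotting these points with abundance in reverse order on the abscissa gives a curve of the general form described for Figure 3A; identifying the three components would require a further fitting step that is not shown.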

Integration of Expression Information with the Genomic Map 

The SAGE expression data could be integrated with existing positional information 
to generate chromosomal expression maps (Figure 4). These maps were generated 
using the sequence of the yeast genome and the position coordinates of ORFs obtained 
from the Stanford Yeast Genome Database. Although there were a few genes that 
were noted to be physically proximal and have similarly high levels of expression, 
there did not appear to be any clusters of particularly high or low expression on any 
chromosome. Genes like histones H3 and H4, which are known to have coregulated 
divergent promoters and are immediately adjacent on chromosome 14 (Smith and 
Murray, 1983), had very similar expression levels (5 and 6 copies per cell,
respectively). The distribution of transcripts among the chromosomes suggested that
overall transcription was evenly dispersed, with total transcript levels being roughly
linearly related to chromosome size (r² = 0.85, data not shown). However, regions
within 10 kb of telomeres appeared to be uniformly undertranscribed, containing on
average 3.2 tags per gene as compared with 12.4 tags per gene for non-telomeric
regions (Figure 4). This is consistent with the previously described observations of
"telomeric silencing" in yeast (Gottschling et al., 1990). Recent studies have reported
telomeric position effects as far as 4 kb from telomere ends (Renauld et al., 1993).
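
The comparison of telomeric and non-telomeric regions can be sketched as follows. The sketch assumes each gene is represented by its ORF start coordinate and its tag count on a chromosome of known length, and it classifies a gene as telomeric when its start lies within 10 kb of either chromosome end. The function name, the data layout, and the example values are hypothetical.

```python
def mean_tags_by_region(genes, chromosome_length, window=10_000):
    """Return (mean tags per telomeric gene, mean tags per other gene).

    `genes` is a list of (orf_start, tag_count) pairs for one chromosome.
    """
    telomeric, interior = [], []
    for start, tags in genes:
        near_end = start <= window or start >= chromosome_length - window
        (telomeric if near_end else interior).append(tags)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(telomeric), mean(interior)


if __name__ == "__main__":
    # Made-up genes on a made-up 200 kb chromosome, for illustration only.
    example = [(5_000, 2), (9_000, 4), (50_000, 12), (195_000, 3)]
    print(mean_tags_by_region(example, chromosome_length=200_000))
```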

Gene Expression Patterns 

Table 1 lists the 30 most highly expressed genes, all of which are expressed at
greater than 60 mRNA copies per cell. As expected, these genes mostly correspond
to well characterized enzymes involved in energy metabolism and protein synthesis
and were expressed at similar levels in all three growth states (Examples in Figure 5).
Some of these genes, including ENO2 (McAlister and Holland, 1982), PDC1 (Schmitt
et al., 1983), PGK1 (Chambers et al., 1989), PYK1 (Nishizawa et al., 1989), and
ADH1 (Denis et al., 1983), are known to be dramatically induced in the glucose-rich
growth conditions used in this study. In contrast, glucose repressible genes such as
the GAL1/GAL7/GAL10 cluster (St John and Davis, 1979), and GAL3 (Bajwa et al.,
1988) were observed to be expressed at very low levels (0.3 or fewer copies per cell).

As expected for the yeast strain used in this study, mating type a specific genes,
such as the a factor genes (MFA1, MFA2) (Michaelis and Herskowitz, 1988), and
alpha factor receptor (STE2) (Burkholder and Hartwell, 1985) were all observed to be
expressed at significant levels (range 2 to 10 copies per cell), while mating type alpha
specific genes (MFα1, MFα2, STE3) (Hagen et al., 1986; Kurjan and Herskowitz,
1982; Singh et al., 1983) were observed to be expressed at very low levels (<0.3
copies/cell).

Three of the highly expressed genes in Table 1 had not been previously 
characterized. One contained an ORF with predicted ribosomal function, previously 
identified only by genomic sequence analysis. Analyses of all SAGE data suggested 
that there were 2,684 such genes corresponding to uncharacterized ORFs which were 
transcribed at detectable levels. The 30 most abundant of these transcripts were 
observed more than 30 times, corresponding to at least 8 transcripts per cell (Table 2). 
The other two highly expressed uncharacterized genes corresponded to ORFs not 
predicted by analysis of the yeast genome sequence (NORF = Nonannotated ORF). 
Analyses of SAGE data suggested that there were at least 160 NORF genes transcribed 
at detectable levels. The 30 most abundant of these transcripts were observed at least 
9 times (Table 3 and examples in Figure 5). 
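
The correspondence between tag observations and transcripts per cell can be checked with a short calculation. The sketch below is illustrative only. It assumes roughly 15,000 mRNA molecules per yeast cell, a figure not stated in this section, and it assumes that counts are normalized to the tags that could be assigned to the genome (60,633 analyzed tags minus the 4,342 unassigned tags reported under SAGE data analysis). The function name is hypothetical.

```python
# Illustrative sketch: convert a SAGE tag count into estimated mRNA copies per cell.
ANALYZED_TAGS = 60_633      # tags analyzed (from the text)
UNASSIGNED_TAGS = 4_342     # tags that could not be assigned (from the text)
MRNA_PER_CELL = 15_000      # ASSUMPTION: total mRNA molecules per yeast cell

# ASSUMPTION: the denominator is the number of assignable tags.
ASSIGNED_TAGS = ANALYZED_TAGS - UNASSIGNED_TAGS


def copies_per_cell(tag_count, total_tags=ASSIGNED_TAGS, mrna_per_cell=MRNA_PER_CELL):
    """Scale a tag's share of the library to an absolute copy number per cell."""
    return tag_count / total_tags * mrna_per_cell


if __name__ == "__main__":
    for observed in (1, 2, 3, 9, 31):
        print(observed, round(copies_per_cell(observed), 2))
```

Under these assumptions a transcript observed 31 times corresponds to a little more than 8 copies per cell, and a single observation to about 0.3 copies per cell, in line with the thresholds quoted in the text.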

Interestingly, one of the NORF genes (NORF5) was only expressed in S phase 
arrested cells and corresponded to the transcript whose abundance varied the most in 
the three states analyzed (>49 fold, Figure 5). Comparison of S phase arrested cells 
to the other states also identified greater than 9 fold elevation of the RNR2 and RNR4 
transcripts (Figure 5). Induction of these ribonucleotide reductase genes is likely to 
be due to the hydroxyurea treatment used to arrest cells in S phase (Elledge and Davis, 
1989). Likewise, comparison of G2/M arrested cells identified elevation of RBL2 
and dynein light chain, both microtubule associated proteins (Archer et al., 1995; Dick 
et al., 1996). As with the RNR inductions, these elevated levels seem likely to be 
related to the nocodazole treatment used to arrest cells in the G2/M phase. While 
there were many relatively small differences between the states (for example, NORF1, 
Figure 5), overall comparison of the three states revealed surprisingly few dramatic 
differences; there were only 29 transcripts whose abundance varied more than 10 fold 
among the three different states analyzed (Tables 4 and 5). 

A comprehensive analysis for NORF genes was performed using the SAGE 
data. Yeast genome intergenic regions were defined as regions outside both annotated 
ORFs and the 500 bp region downstream of annotated ORFs (yeast genome sequence 
and tables of annotated ORFs were obtained from SGD at 
http://genome-www.stanford.edu/Saccharomyces/). Based on sequence analysis, a 
total of 9524 putative ORFs of 25-99 amino acids were present in the intergenic 
regions; 510 of these ORFs contain or are adjacent to observed SAGE tags (Table 6). 
Of the 60,633 SAGE tags analyzed, there were 302 unique SAGE tags either within 
or adjacent to intergenic ORFs (100 bp upstream or 500 bp downstream of the ORF) 
(Table 6). Note that in some cases, more than one NORF contains or is adjacent to the 
SAGE tag. These tags matched the genome uniquely, were in the correct orientation, 
and were expressed at levels greater than 0.3 transcript copies per cell. 
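
The search for short intergenic ORFs can be illustrated with a minimal scan of both strands. This is a sketch under stated assumptions, not the procedure actually used: it assumes an ORF is defined as an in-frame ATG followed by the first in-frame stop codon, and it takes only the 25-99 amino acid size limits from the text. The function names are hypothetical.

```python
# Illustrative sketch: find ATG...stop ORFs encoding 25-99 residues on both strands.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def small_orfs(seq, min_aa=25, max_aa=99):
    """Yield (start, end, strand) for each qualifying ORF.

    Coordinates are 0-based, end-exclusive, and always given on the
    forward strand; the stop codon is included in the interval.
    """
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG after the previous stop
                elif start is not None and codon in STOPS:
                    residues = (i - start) // 3    # codons before the stop
                    if min_aa <= residues <= max_aa:
                        end = i + 3
                        if strand == "+":
                            yield (start, end, "+")
                        else:
                            yield (n - end, n - start, "-")
                    start = None


if __name__ == "__main__":
    demo = "ATG" + "GCT" * 24 + "TAA"   # 25 residues followed by a stop codon
    print(list(small_orfs(demo)))
```

In such a scheme, each candidate ORF would then be compared against tag positions lying within the ORF, 100 bp upstream, or 500 bp downstream of it.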

The expression level for each NORF shown in Table 6 corresponds to the 
number of mRNA transcript copies per cell. If the expression level is positive it 
means that the tag is on the + strand of the chromosome; if negative, the tag is on the 
- strand of the chromosome. 

Discussion 

Analysis of a yeast transcriptome affords a unique view of the RNA components 
defining cellular life. Comparison of gene expression patterns from altered 
physiologic states can provide insight into genes that are important in a variety of 
processes. Comparison of transcriptomes from a variety of physiologic states should 
provide a minimum set of genes whose expression is required for normal vegetative 
growth, and another set composed of genes that will be expressed only in response to 
specific environmental stimuli, or during specialized processes. For example, recent 
work has defined a minimal set of 250 genes required for prokaryotic cellular life 
(Mushegian and Koonin, 1996). Examination of the yeast genome readily identified 
homologous genes for 196 of these, over 90% of which were observed to be expressed 
in the SAGE analysis. Detailed analyses of yeast transcriptomes, as well as 
transcriptomes from other organisms, should ultimately allow the generation of a 
minimal set of genes required for eukaryotic life. 

Like other genome-wide analyses, SAGE analysis of yeast transcriptomes has 
several potential limitations. First, a small number of transcripts would be expected 
to lack an NlaIII site and therefore would not be detected by our analysis. Second, our 
analysis was limited to transcripts found at least as frequently as 0.3 copies per cell. 
Transcripts expressed in only a minute fraction of the cell cycle, or transcripts 
expressed in only a fraction of the cell population, would not be reliably detected by 
our analysis. Finally, mRNA sequence data are practically unavailable for yeast, and 
consequently, some SAGE tags cannot be unambiguously matched to corresponding 
genes. Tags which were derived from overlapping genes, or genes which have 
unusually long 3' untranslated regions, may be misassigned. Increased availability of 
3' UTR sequences in yeast mRNA molecules should help to resolve the ambiguities. 

Despite these potential limitations, it is clear that the analyses described here 
furnish both global and local pictures of gene expression, precisely defined at the 
nucleotide level. These data, like the sequence of the yeast genome itself, provide 
simple, basic information integral to the interpretation of many experiments in the 
future. The availability of mRNA sequence information from EST sequencing as 
well as various genome projects will soon allow definition of transcriptomes from a 
variety of organisms, including human. The data recorded here suggest that a 
reasonably complete picture of a human cell transcriptome will require only about 
10-20 fold more tags than evaluated here, a number well within the practical realm 
achievable with a small number of automated sequencers. Global expression patterns 
in higher eukaryotes are expected, in general, to be similar to those 
reported here for S. cerevisiae. However, the analysis of the transcriptome in different 
cells and from different individuals should yield a wealth of information regarding 
gene function in normal, developmental, and disease states. 

Experimental Procedures 
Yeast cell culture 

The source of transcripts for all experiments was S. cerevisiae strain YPH499 
(MATa ura3-52 lys2-801 ade2-101 leu2-Δ1 his3-Δ200 trp1-Δ63) (Sikorski and 
Hieter, 1989). Logarithmically growing cells were obtained by growing yeast cells to 
early log phase (3 x 10^6 cells/ml) in YPD (Rose et al., 1990) rich medium (YPD 
supplemented with 6 mM uracil, 4.8 mM adenine and 24 mM tryptophan) at 30°C. 
For arrest in the G1/S phase of the cell cycle, hydroxyurea (0.1 M) was added to early 
log phase cells, and the culture was incubated an additional 3.5 hours at 30°C. For 
arrest in the G2/M phase of the cell cycle, nocodazole (15 µg/ml) was added to early 
log phase cells and the culture was incubated for an additional 100 minutes at 30°C. 
Harvested cells were washed once with water prior to freezing at -70°C. The growth 
states of the harvested cells were confirmed by microscopic and flow cytometric 
analyses (Basrai et al., 1996). 

SAGE protocol 

The SAGE method was performed as previously described (Velculescu et al., 
1995; Kinzler et al., U.S. Patents 5,866,330 and 5,695,937), with exceptions noted 
below. PolyA RNA was converted to double-stranded cDNA with a BRL synthesis 
kit using the manufacturer's protocol except for the inclusion of primer 
biotin-5'-T18-3'. The cDNA was cleaved with NlaIII (Anchoring Enzyme). As NlaIII 
sites were observed to occur once every 309 base pairs in three arbitrarily chosen yeast 
chromosomes (1, 5, 10), 95% of yeast transcripts were predicted to be detectable with 
a NlaIII-based SAGE approach. After capture of the 3' cDNA fragments on 
streptavidin coated magnetic beads (Dynal), the bound cDNA was divided into two 
pools, and one of the following linkers containing recognition sites for BsmFI was 
ligated to each pool: 

Linker 1, 5'-TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGACATG-3' (SEQ 
ID NO:1), 5'-TCCCTATTAAGCCTAGTTGTACTGCACCAGCAAATCC[amino mod. C7]-3' 
(SEQ ID NO:2); 

Linker 2, 5'-TTTCTGCTCGAATTCAAGCTTCTAACGATGTACGGGGACATG-3' (SEQ ID 
NO:3), 5'-TCCCCGTACATCGTTAGAAGCTTGAATTCGAGCAG[amino mod. C7]-3' 
(SEQ ID NO:4). 

As BsmFI (Tagging Enzyme) cleaves 14 bp away from its recognition site, and 
the NlaIII site overlaps the BsmFI site by 1 bp, a 15 bp SAGE tag was released with 
BsmFI. SAGE tag overhangs were filled in with Klenow, and tags from the two pools 
were combined and ligated to each other. The ligation product was diluted and then 
amplified with PCR for 28 cycles with 5'-GGATTTGCTGGTGCAGTACA-3' (SEQ 
ID NO:5) and 5'-CTGCTCGAATTCAAGCTTCT-3' (SEQ ID NO:6) as primers. The 
PCR product was analyzed by polyacrylamide gel electrophoresis (PAGE), and the 
PCR product containing two tags ligated tail to tail (ditag) was excised. The PCR 
product was then cleaved with NlaIII, and the band containing the ditags was excised 
and self-ligated. After ligation, the concatenated products were separated by PAGE 
and products between 500 bp and 2 kb were excised. These products were cloned into 
the SphI site of pZero (Invitrogen). Colonies were screened for inserts by PCR with 
M13 forward and M13 reverse sequences located outside the cloning site as primers. 
PCR products from selected clones were sequenced with the TaqFS DyePrimer 
kits (Perkin Elmer) and analyzed using a 377 ABI automated sequencer (Perkin 
Elmer), following the manufacturer's protocol. Each successful sequencing reaction 
identified an average of 26 tags; given a 90% sequencing reaction success rate, this 
corresponded to an average of about 850 tags per sequencing gel. 
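
How tags might be recovered from a sequenced concatemer can be sketched as follows. This is a simplified illustration and not the SAGE program group itself. It assumes ditags are delimited by the NlaIII site CATG, that each ditag is 20 to 24 bp long, and that 10 bases are kept from each end; the length window and the function names are assumptions made for the example.

```python
# Illustrative sketch: split a concatemer read at NlaIII sites and extract tags.
ANCHOR = "CATG"        # NlaIII recognition site
TAG_LEN = 10           # bases kept adjacent to the anchoring site
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_tags(read, min_ditag=20, max_ditag=24):
    """Return the tags found in one read, counting each distinct ditag once.

    The ditag length window is an assumption for this sketch. Ditags
    repeated in the opposite orientation are not detected here.
    """
    seen = set()
    tags = []
    # Keep only the spans flanked by an anchoring site on both sides.
    for ditag in read.upper().split(ANCHOR)[1:-1]:
        if not (min_ditag <= len(ditag) <= max_ditag):
            continue                   # wrong spacing between anchoring sites
        if ditag in seen:
            continue                   # repeated ditag, possible PCR duplicate
        seen.add(ditag)
        tags.append(ditag[:TAG_LEN])                        # left tag
        tags.append(reverse_complement(ditag[-TAG_LEN:]))   # right tag
    return tags


if __name__ == "__main__":
    ditag = "AAAAAAAAAC" + "AC" + reverse_complement("GGGGGGGGGT")
    print(extract_tags(ANCHOR + ditag + ANCHOR + ditag + ANCHOR))
```

Counting a repeated ditag only once mirrors the rule, described under SAGE data analysis, that tags from repeated ditags are counted a single time to limit PCR bias.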

SAGE data analysis 

Sequence files were analyzed by means of the SAGE program group 
(Velculescu et al., 1995), which identifies the anchoring enzyme site with the proper 
spacing and extracts the two intervening tags and records them in a database. The 
68,691 tags obtained contained 62,965 tags from unique ditags and 5,726 tags from 
repeated ditags. The latter were counted only once to eliminate potential PCR bias of 
the quantitation, as described (Velculescu et al., 1995). Of 62,965 tags, 2,332 tags 
corresponded to linker sequences, and were excluded from further analysis. Of the 
remaining tags, 4,342 tags could not be assigned, and were likely due to sequencing 
errors (in the tags or in the yeast genomic sequence). If all of these were due to tag 
sequencing errors, this corresponds to a sequencing error rate of about 0.7% per base 
pair (for a 10 bp tag), not far from what we would have expected under our automated 
sequencing conditions. However, some unassigned tags had a much higher than 
expected frequency of A's as the last five base pairs of the tag (5 of the 52 most 
abundant unassigned tags), suggesting that these tags were derived from transcripts 
containing anchoring enzyme sites within several base pairs from their polyA tails. 
Given the frequency of NlaIII sites in the genome (one in 309 base pairs), 
approximately 3% of transcripts were predicted to contain NlaIII sites within 10 bp of 
their polyA tails. 
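
The tag accounting and the error-rate estimate in this paragraph can be reproduced with a few lines of arithmetic. The sketch below uses only the counts given in the text. It assumes that a tag becomes unassignable whenever any one of its 10 bases is miscalled, and it treats the polyA estimate as a simple ratio of window length to site spacing; both are modeling assumptions for illustration.

```python
# Illustrative arithmetic check of the tag counts and derived rates.
total_tags = 68_691
unique_ditag_tags = 62_965
repeated_ditag_tags = 5_726
linker_tags = 2_332
unassigned_tags = 4_342
TAG_LEN = 10             # bases per tag beyond the NlaIII site
SITE_SPACING = 309       # one NlaIII site per 309 bp (from the text)

# Tags from unique and repeated ditags add up to the total obtained.
assert unique_ditag_tags + repeated_ditag_tags == total_tags

# Tags remaining after linker-derived tags are removed.
analyzed = unique_ditag_tags - linker_tags

# ASSUMPTION: a tag is unassignable if any of its bases is miscalled, so
# 1 - (1 - p) ** TAG_LEN equals the unassigned fraction.
unassigned_fraction = unassigned_tags / analyzed
per_base_error = 1 - (1 - unassigned_fraction) ** (1 / TAG_LEN)

# ASSUMPTION: chance of a site within 10 bp of the polyA tail is roughly 10/309.
site_near_polya = 10 / SITE_SPACING

if __name__ == "__main__":
    print(analyzed)
    print(f"{per_base_error:.2%}")
    print(f"{site_near_polya:.1%}")
```

The remaining tag count matches the 60,633 tags analyzed elsewhere in this description, and the two derived rates come out close to the 0.7% and 3% figures quoted above.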

As very sparse data are available for yeast mRNA sequences and efforts to date 
have not been able to identify a highly conserved polyadenylation signal (Irniger and 
Braus, 1994; Zaret and Sherman, 1982), we used 14 bp of SAGE tags (i.e., the NlaIII 
site plus the adjacent 10 bp) to search the yeast genome directly (yeast genome 
sequence obtained from the Stanford yeast genome ftp site (genome-ftp.stanford.edu) 
on August 7, 1996). Because only coding regions are annotated in the yeast genome, 
and SAGE tags can be derived from 3' untranslated regions of genes, a SAGE tag was 
considered to correspond to a particular gene if it matched the ORF or the region 500 
bp 3' of the ORF (locus names, gene names and ORF chromosomal coordinates were 
obtained from the Stanford yeast genome ftp site, and ORF descriptions were obtained 
from the MIPS www site (http://www.mips.biochem.mpg.de/) on August 14, 1996). 
ORFs were considered genes with known functions if they were associated with a 
three letter gene name, while ORFs without such designations were considered 
uncharacterized. 

As expected, SAGE tags matched transcribed portions of the genome in a highly 
non-random fashion, with 88% matching ORFs or their adjacent 3' regions in the 
correct orientation (chi-squared P value <1 0"^. In instances when more than one tag 
matched a particular ORF in the correct orientation, the abundance was calculated to 
be the sum of the matched tags. Tags that matched ORFs in the incorrect orientation 
were not used in abundance calculations. In instances when a tag matched more than 
one region of the genome (for example an ORF and non-ORF region) only the 
matched ORF was considered. In some cases the 15th base of the tag could also be 
used to resolve ambiguities. 
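
The assignment rules above can be summarized in a small sketch. It is a simplified illustration rather than the original analysis code: ORF records, tag hits, and the gene names in the example are hypothetical, coordinates are assumed to be 1-based with start less than end, and each tag hit is credited to the first ORF that covers it.

```python
# Illustrative sketch: credit tag counts to ORFs using the 500 bp 3' rule.
from collections import defaultdict
from typing import NamedTuple


class Orf(NamedTuple):
    name: str
    chrom: int
    start: int     # 1-based, inclusive, start < end on either strand
    end: int
    strand: str    # "+" or "-"


class TagHit(NamedTuple):
    chrom: int
    pos: int       # genomic position of the tag match
    strand: str    # strand on which the tag reads
    count: int     # number of times the tag was observed


DOWNSTREAM = 500   # bp 3' of the ORF that still count as the same gene


def covers(orf, hit, downstream=DOWNSTREAM):
    """True if the hit lies in the ORF or its 3' flank, in the correct orientation."""
    if orf.chrom != hit.chrom or orf.strand != hit.strand:
        return False   # wrong chromosome or wrong orientation: not counted
    if orf.strand == "+":
        return orf.start <= hit.pos <= orf.end + downstream
    return orf.start - downstream <= hit.pos <= orf.end


def orf_abundance(orfs, hits):
    """Sum the counts of all correctly oriented tags matching each ORF."""
    totals = defaultdict(int)
    for hit in hits:
        for orf in orfs:
            if covers(orf, hit):
                totals[orf.name] += hit.count
                break          # credit each tag hit to one ORF only
    return dict(totals)


if __name__ == "__main__":
    orfs = [Orf("YFG1", 1, 1000, 2000, "+"), Orf("YFG2", 1, 5000, 6000, "-")]
    hits = [TagHit(1, 1500, "+", 5), TagHit(1, 2400, "+", 3), TagHit(1, 4600, "-", 2)]
    print(orf_abundance(orfs, hits))
```

Tags that fall more than 500 bp 3' of every annotated ORF are left unassigned by this scheme, which is the pool from which NORF candidates are drawn.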

For the identification of NORF genes, only tags were considered that matched 
portions of the genome that were further than 500 bp 3' of a previously identified ORF 
and were observed at least two times in the SAGE libraries. 

SAGE Tag              Seq. ID   Chr   Tag Pos   Copies
GGCGCAATTT            97        4     1108395   2
TAAGTGATGA            98        7     593382    2
TTGTTGAATT            99        10    608373    2
GAAGCAGTAA            100       3     155607    2
ACATATGTTA            101       4     916112    2
CCCTACACGG            102       6     223289    2
GTAATTGGAC            103       10    392099    2
ATCAGACAAA            104       14    687272    2
TTATGAAAGA            105       15    81263     2
ATTCGTTCTA            106       15    841970    2
AGCAGGAGTT            107       16    188350    2
TTCTATTAGG            108       2     418749    2
TGGATTTCAG            109       4     1224930   2
CAGATATAAT            110       5     52488     2
CTGTTTTGGG            111       11    374761    2
CATTTTTAGT            112       11    508212    2
TTGAAAAGAT            113       13    104160    2
TAAGCCCATC            114       13    251273    2
AGCGTCCTCA            115       15    832420    2
TTTAGTTAAT            116       2     477623    2
ATGGTAGCCA            117       3     56961     2
AATTAGACTA            118       3     162589    2
AGTGACTCTT            119       4     1490879   2
GGACTATAAG            120       5     251266    2
ACTTTTTCAG            121       10    159213    2
GTCATATAGT            122       13    158765    2
CAACAAAGTG            123       13    171166    2
GTGGGAAAGG            124       13    804600    2
TACTTTATAT            125       16    366449    2
AATACCAGCG            126       3     175540    1
GCCTTGTATA            127       4     372624    1
GGTACATTCA            128       5     67152
GATTTCTCTG            129       5     187462    1
TAGTTGCTCC            130       7     317108
GTAAGAAATC            131       7     836202    1
CTTGGGCTAT            132       8     107992    1
AAATGGTGAT            133       11    558686    1
ATCATTTGGG            134       12    199358    1
CTGAACTTTA            135       12    283720
CCAGAAGGAG            136       13    652873
CCGGTTACTA            137       15    803663
CGATGAGAAG            138       15    1004369
AAACCGTCCC            139       16    199141
TCATTCATAC            140       2     164728
TATCTTTTTG            141       4     169784
TTAGAATAAT            142       4     603508
GTACGCTGTG            143       5     118089
TATATTAATT            144       6     64228
GTTCTTGCCT            145       7               1
ATATAGCTGC            146       10              1
CCAAAAAAAA            147       11    91785     1
GAACTCCACA            148       11    94125     1
CCTTCACTGC            149       11    374172    1
CACATCATAA            150       11    625896    1
GAAGTATTGA            151       12    603999    1
TGCGCGTATA            152       13    206410    1
GGGTAGTACT            153       13    671730    1
TAGTTTTGTC            154       15    33475     1
CAATTCCTAC            155       1     172182    0.8
TTTGATTTGA            156       2     46431     0.8
GGCTCTGGTT            157       2     414510    0.8
CAGAAATAGC            158       2     565130    0.8
CTGTTATTTT            159       2     616054    0.8
CGAAGTCAAA            160       2     680605    0.8
CTCTAGATAA            161             171584    0.8
AGTCAAAATG            162       4     192750    0.8
GCGAGTTTAG            163       4     691301    0.8
GCTCCAATAG            164       4     1131020   0.8
TTTATTTGAG            165       4     1237501   0.8
GTTATATTGA            166       4     1401803   0.8
TGGGTTGAAG            167       5     251266    0.8
ATTTTATTTG            168       5     447729    0.8
ATCATAAAAA            169       5     548612    0.8
TTATATAAAA            170       6     223182    0.8
CTACTTCTGC            171       8     34653     0.8
ATAAGACAGT            172       10    227B0Z    0.8
TTCATAAGTT            173       10    471894    0.8
TAAATCTGAG            174       11    146617    0.8
CTGGTAGAAA            175       11    151174    0.8
CACGTACACA            176       11    403208    0.8
CCAAGATCAA            177       11    425882    0.8
AGCTTGTTCC            178       12    234966    0.8
CACATTCGTT            179       12    759953    0.8
CTTACATATA            180       12    789781    0.8
TCTATAGCAA            181       13    228936    0.8
CCTTTCTGAA            182       13    297985    0.8
CCTTTAGAAT            183       13    777999    0.8
AATTAACACC            184       13    842122    0.8
GCGCAGGGGC            185       14    440984    0.8
TGTTTATAAA            186       14    661710    0.8
AAAAGTCATT            187       15    32081     0.8
TTCGTAAACT            188       15    680625    0.8
TTTTTGGAGT            189       15    888343    0.8
AGGCATCTTG            190       16    250284    0.8
AAATCAAAAC            191       16    453890    0.8
AATTGACGAA            192       16    560169    0.8
TTGATGATTT            193       16    582360    0.8
CCTGTTTTTG            194       16    643476    0.8
TTTTTAAAAA            195       1     101436    0.5
AAGTTTGATC            196       1     199848    0.5
AGCACCTATG            197       2     46913     0.5
TGATTTATCC            198       2     418946    0.5
ACTGCATCTG            199       2     680860    0.5
CAAGTTAGGA            200       2     744770    0.5
ATACCCAATT            201       3     29939     0.5
AACTTTGTAT            202       3     30056     0.5
GCGGCGGGTG            203       3     41645     0.5
AAAATTGTTC            204       3     57108     0.5
TCAAGTACTC            205       3     157856    0.5
AACTGTATGC            206       3     223682    0.5
CTATCGGCCA            207       3     278840    0.5
ACAAGCCCAA            208       3     289917    0.5
GTACAGGGCT            209       4     93873     0.5
AAGATCATCG            210       4     254851    0.5
GAACTCCTGG            211       4     340891    0.5
GAACGAGAAG            212       4     371850    0.5
TTTTTAATAC            213       4     372058    0.5
TCTCCAGTTG            214       4     381712    0.5
AATACGTTAC            215       4     471791    0.5
ACGATTGGCT            216       4     509158    0.5
TGTTTATAAG            217       4     521709    0.5
CGTTTTCGTC            218       4     538839    0.5
TCGAACCTCT            219       4     578702    0.5
TCCACACACA            220       4     930972    0.5
CCGTGCGTGC            221       4     1324367   0.5
TTTCTTCAAC            222       5     116099    0.5
CCAAGTCTCG            223       5     159320    0.5
AGAGCGAATT            224       5     207517    0.5
TGTAGATTAT            225       5     280465    0.5
AAAAGTAGTT            226       5     286387    0.5
ACTTGGTATG            227       5     422942    0.5
TTAATGTTAT            228       5     544523    0.5
TACACGCGCG            229       5     544555    0.5
GGTCACTCCT            230       6     62983     0.5
AAGTGATGAA            231       6     76141     0.5
TTTATCTTGT            232       6     130327    0.5
AGTGATTGTT            233       6     256223    0.5
GCTTTGTTGT            234       7     72577     0.5
TCATTGATTC            235       7     110590    0.5
TTCACCGGAA            236       7     323655    0.5
ACTATTCTGT            237       7     423957    0.5
GGGCCAACCC            238       7     433787    0.5
AAAATATCTT            239       7     559397    0.5
TAGTAGTAAC            240       7     622201    0.5
AAGCGCACAA            241       7     735909    0.5
TCGCTGTTTT            242       7     800300    0.5
TGTATTTTTG            243       7     836202    0.5
CTAAACAAAG            244       7     836587    0.5
TAGGAAGAAA            245       7     905046    0.5
GGAAAAATTA            246       7     958839    0.5
III \y\Jf^ i r\\J 1   247
III W 1 W 1           248
AGAAAAAAAP            249
TAAAGTCPA(^           250
TAAGPAGATT            251
ATGAGPATTT            252
AGGTGCAAAA            253
TAACAAAGArr           254
CAATTGGCAA            255
ACTCCCTGTA            256
CTCTATTGAT            257
GCTTTCCTTT            258
ACCGPAAAGA            259
rTTGTTOAAA            260
AATGTriPTrST          261
GnAGATAi^r^rs         262

CLAIMS 

1. An isolated DNA molecule comprising a coding sequence of a yeast 
gene selected from the group of NORF genes comprising a SAGE tag as shown in 
SEQ ID NOS:67-811. 

2. The isolated DNA molecule of claim 1 which is involved in cell cycle 
progression. 

3. The isolated DNA molecule of claim 2 wherein expression of the NORF 
gene varies by at least 10% between any two phases of the cell cycle selected from the 
group consisting of: log phase, S phase, and G2/M. 

4. The isolated DNA molecule of claim 2 wherein expression of the NORF 
gene varies by at least 25% between any two phases of the cell cycle selected from the 
group consisting of: log phase, S phase, and G2/M. 

5. The isolated DNA molecule of claim 2 wherein expression of the NORF 
gene varies by at least 50% between any two phases of the cell cycle selected from the 
group consisting of: log phase, S phase, and G2/M. 

6. The isolated DNA molecule of claim 2 wherein expression of the NORF 
gene varies by at least 100% between any two phases of the cell cycle selected from 
the group consisting of: log phase, S phase, and G2/M. 

7. The isolated DNA molecule of claim 2 wherein expression of the NORF 
gene varies by a statistically significant difference (greater than 95% confidence level) 
between any two phases of the cell cycle selected from the group consisting of: log 
phase, S phase, and G2/M. 

8. The isolated DNA molecule of claim 7 wherein the NORF gene is 
selected from the group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27. 

9. The isolated DNA molecule of claim 2 wherein the NORF gene is not 
expressed in at least one phase of the cell cycle selected from the group consisting of: 
log phase, S phase, and G2/M. 

10. The isolated DNA molecule of claim 1 which is genomic. 

11. The isolated DNA molecule of claim 1 which is cDNA. 

12. A method of using NORF genes to affect the cell cycle, comprising the 
step of: 

administering to a cell an isolated DNA molecule comprising a coding 
sequence of a NORF gene whose expression varies by at least 10% between any two 
phases of the cell cycle selected from the group consisting of log phase, S phase, and 
G2/M. 

13. The method of claim 12 wherein the cell is a yeast cell. 

14. The method of claim 12 wherein the cell is a fungal cell. 

15. The method of claim 12 wherein the cell is a mammalian cell. 

16. The method of claim 12 wherein the NORF gene is selected from the 
group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27. 

17. A method for screening candidate antifungal drugs, comprising the steps 
of: 

contacting a test substance with a yeast cell; 

monitoring expression of a NORF gene whose expression varies by at 
least 10% between any two phases of the cell cycle selected from the group consisting 
of log phase, S phase, and G2/M, wherein a test substance which modifies the 
expression of the yeast gene is a candidate antifungal drug. 

18. The method of claim 17 wherein the NORF gene is selected from the 
group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27. 

19. A method for identifying human genes which are involved in cell cycle 
progression, comprising the steps of: 

contacting human DNA with a probe which comprises at least 10 
contiguous nucleotides of a NORF gene whose expression varies by at least 10% 
between any two phases of the cell cycle selected from the group consisting of log 
phase, S phase, and G2/M phase, wherein a human DNA sequence which hybridizes 
to the probe is identified as a sequence of a candidate human gene which is involved 
in cell cycle progression. 

20. The method of claim 19 wherein the NORF gene is selected from the 
group consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27. 

21. A probe comprising at least 14 contiguous nucleotides of a NORF gene 
comprising a SAGE tag as shown in SEQ ID NOS:67-811. 

22. The probe of claim 21 wherein expression of the NORF gene varies by 
at least 10% between any two phases of a cell cycle selected from the group consisting 
of: log phase, S phase, and G2/M. 

23. The probe of claim 22 wherein expression of the NORF gene varies by 
at least 25% between any two phases of the cell cycle selected from the group 
consisting of: log phase, S phase, and G2/M. 

24. The probe of claim 22 wherein expression of the NORF gene varies by 
at least 50% between any two phases of the cell cycle selected from the group 
consisting of: log phase, S phase, and G2/M. 

25. The probe of claim 22 wherein expression of the NORF gene varies by 
at least 100% between any two phases of the cell cycle selected from the group 
consisting of: log phase, S phase, and G2/M. 

26. The probe of claim 22 wherein the NORF gene is not expressed in at 
least one phase of the cell cycle selected from the group consisting of: log phase, S 
phase, and G2/M. 

27. The probe of claim 22 wherein expression of the NORF gene varies by 
a statistically significant difference (greater than 95% confidence level) between any 
two phases of the cell cycle selected from the group consisting of: log phase, S phase, 
and G2/M. 

28. The probe of claim 22 wherein the gene is selected from the group 
consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27. 

29. The method of claim 17 wherein said step of monitoring expression is 
performed using nucleic acid molecules which are immobilized on a solid support. 

30. The method of claim 29 wherein the nucleic acid molecules are in an 
array. 

31. The method of claim 19 wherein a probe which comprises a portion of 
the NORF gene is in an array on a solid support. 

32. An array of probes on a solid support wherein at least one probe 
comprises at least 14 contiguous nucleotides of a NORF gene comprising a SAGE tag 
as shown in SEQ ID NOS:67-811. 

33. The array of claim 32 wherein the at least one NORF gene is involved 
in cell cycle progression. 

34. The array of claim 32 wherein the NORF gene is selected from the group 
consisting of NORF No. 1, 2, 4, 5, 6, 17, 25, and 27. 

35. The array of claim 32 which comprises at least 100 probes of distinct 
sequence. 

36. The array of claim 32 which comprises at least 500 probes of distinct 
sequence. 

37. The array of claim 32 which comprises at least 1,000 probes of distinct 
sequence. 

38. A method of identifying a candidate drug as a member of a class of 
drugs having a characteristic effect on gene expression in a yeast cell, comprising the 
steps of: 

contacting a yeast cell with a candidate drug; and 
monitoring expression in the yeast cell of at least one NORF gene 
whose expression is affected by the class of drugs, wherein detection of a difference 
in expression of the at least one NORF gene in the yeast cell relative to expression in 
the absence of the candidate drug identifies the candidate drug as a member of the 
class of drugs. 

39. The method of claim 38 wherein the step of monitoring expression is 
performed using nucleic acid molecules which are immobilized on a solid support. 

40. The method of claim 39 wherein the nucleic acid molecules are in an 
array. 

41. The method of claim 38 wherein expression of two or more NORF genes 
is monitored. 

42. The probe of claim 21 which is immobilized on a solid support. 